CONTEXTUAL CLASSIFICATION
OF MULTISPECTRAL IMAGE DATA 

By 

P. H. Swain, S. B. Vardeman, J. C. Tilton 


ABSTRACT 

Compound decision theory is invoked to develop a 
model for classifying image data using spatial context. 
Methods for characterizing contextual information in 
an image are proposed and tested. Experimental results 
based on both simulated and real multispectral remote 
sensing data demonstrate the effectiveness of the con- 
textual classifier. A number of practical problems 
associated with this approach are discussed and pos- 
sible solutions are explored. 
1. INTRODUCTION


Multispectral image data collected by remote sensing
devices aboard aircraft and spacecraft are relatively
complex data entities. Both the spatial attributes and
spectral attributes of these data are known to be
information bearing, but to reduce the magnitude of the
computations involved, most analysis efforts have focused
on one or the other. Only within the last few years have
serious efforts been made to utilize them jointly. For
example, one approach uses the spectral homogeneity of
"objects," such as agricultural fields, to segment the
scene and then uses sample classification to assign each
object as a whole, rather than its individual pixels
(picture elements), to an appropriate ground cover class.
Another approach involves extraction of features based on
gray-tone spatial-dependency matrices from which
texture-like characteristics are developed (3).


In this paper we describe a more general way to ex- 
ploit the spatial/spectral context of a pixel to achieve 
accurate classification. Just as in written English 
one can expect to find certain letters occurring regu- 
larly in particular arrangements with other letters 
(qu, ee, est, tion), so certain classes of ground cover
are likely to occur in the "context" of others. The 
former phenomenon has been used to improve character 
recognition accuracy in text reading machines. We shall 
demonstrate that the latter can be used to improve ac- 
curacy in classifying remote sensing data. Intuitively 
this should not be surprising since one can easily 
think of ground cover classes more likely to occur in 
some contexts than in others. One does not expect to 
find wheat growing in the midst of a housing subdivision, 
for example. A close-grown lush vegetative cover in 
such a location is more likely the turf of a lawn.


2. THE MODEL

Consistent with the general characteristics of imaging
systems for remote sensing, we assume a two-dimensional
array of $N = N_1 \times N_2$ pixels of fixed but unknown
classification, as shown in Figure 1.


Figure 1. A two-dimensional array of $N = N_1 \times N_2$ pixels.

Figure 2. Examples of p-context arrays.
Associated with the pixel having image coordinates
(i,j) is its true state or true classification
$\theta_{ij} \in \Omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$
and a random measurement vector $X_{ij} \in R^n$
(observation) having class-conditional density
$p(X_{ij} \mid \theta_{ij})$. We note that
$\{p(X \mid \omega_i),\ i = 1, 2, \ldots, m\}$
is the set of class-conditional probability density
functions associating the multispectral measurement
vector $X$ with the classes.

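Let $\underline{X}$ denote a vector whose components are the
ordered pixel measurement vectors:

$$\underline{X} = [X_{ij} \mid i = 1, 2, \ldots, N_1;\ j = 1, 2, \ldots, N_2]^T .$$

Similarly, let $\underline{\theta}$ be the vector of states:

$$\underline{\theta} = [\theta_{ij} \mid i = 1, 2, \ldots, N_1;\ j = 1, 2, \ldots, N_2]^T .$$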

The individual measurement vectors are assumed to be
class-conditionally independent; that is, their joint
density can be written as:

$$p(\underline{X} \mid \underline{\theta}) = \prod_{i,j} p(X_{ij} \mid \theta_{ij}). \qquad (1)$$

Evidence that this is a reasonable assumption may be found
in reference (4).

Let the action (classification) taken with respect to
pixel (i,j) be denoted by $a_{ij}$. The loss suffered by
taking action $a_{ij}$ when the true class is $\theta_{ij}$
is denoted by $L(\theta_{ij}, a_{ij})$ for some fixed
non-negative function $L(\cdot,\cdot)$. Then the average loss
suffered over the N classifications in the array is

$$\frac{1}{N} \sum_{i,j} L(\theta_{ij}, a_{ij}).$$

If we make the action $a_{ij}$ a function of the observations,
then for a given array $\underline{\theta}$ the expected average
loss (or risk) is

$$R_{\underline{\theta}} = E\left[ \frac{1}{N} \sum_{i,j} L(\theta_{ij}, a_{ij}(\underline{X})) \right] \qquad (2)$$

where the expectation is with respect to the distribution
of the vector of observations.

Our objective may be stated as follows: We want to
determine the dependence of the decision function
$a_{ij}(\cdot)$ on $\underline{X}$ in such a way that for any
given array $\underline{\theta}$, the risk, equation (2), will
be minimum. One way to approach the problem of making
$R_{\underline{\theta}}$ small is to view $\underline{\theta}$ as
a realization of a random process in two dimensions and to
derive a decision rule which is Bayes versus this "prior
distribution" for $\underline{\theta}$ (probably under some
simplifying assumptions concerning the nature of this
process). This is the approach of Welch and Salter and Yu,
who make assumptions on the random process sufficient to
guarantee that the Bayes decision concerning pixel (i,j)
depends on $\underline{X}$ only through $X_{ij}$ and the four
nearest neighbors of the pixel.

We will adopt an approach to controlling
$R_{\underline{\theta}}$ through $a_{ij}(\cdot)$ that is more
closely related to the large body of statistical literature
traceable to Robbins, and known as compound decision
theory. See, for example, the works and references of
VanRyzin (9), Cover and Shenhar, and Vardeman. Rather than
looking for a distribution for $\underline{\theta}$ whose
associated Bayes rule is both simple and has small
$R_{\underline{\theta}}$ for most $\underline{\theta}$, we use
the following argument. First, specify some arrangement of
p pixel locations including a pixel to be classified.
Call this arrangement the p-context array, several
choices of which are shown in Figure 2.

Let $\theta^p \in \Omega^p$ and $X^p \in (R^n)^p$ stand
respectively for p-vectors of classes and n-dimensional
measurements; each component of $\theta^p$ is a variable
which can take on values in $\Omega$; each component of
$X^p$ is a random n-dimensional vector which can take on
values in the observation space. Correspondence of the
components of $\theta^p$ and $X^p$ to the positions in
the p-context array is fixed but arbitrary except that
the pixel to be classified in the array will always
correspond to the pth components. The notation
$\theta^p_{ij}$ and $X^p_{ij}$ will refer to the particular
instance of $\theta^p$ and $X^p$ associated with pixel (i,j).

Now consider finding an optimal decision rule of the
form

$$a_{ij}(\underline{X}) = d(X^p_{ij}) \qquad (3)$$

for a fixed function $d(\cdot)$ mapping p-vectors of
observations to actions. The risk associated with any rule
of this form is, from equation (2),
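$$R_{\underline{\theta}}
= E\left[ \frac{1}{N} \sum_{i,j} L(\theta_{ij}, d(X^p_{ij})) \right]
= \frac{1}{N} \sum_{i,j} E\left[ L(\theta_{ij}, d(X^p_{ij})) \right]
= \sum_{\theta^p \in \Omega^p} G(\theta^p)\, E\left[ L(\theta_p, d(X^p)) \right] \qquad (4)$$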


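where $G(\theta^p)$, the context distribution, is the relative
frequency with which $\theta^p$ occurs in the array
$\underline{\theta}$ and $\theta_p$ is the pth component of
$\theta^p$. Notice that $R_{\underline{\theta}}$ depends on
$\underline{\theta}$ only through $G(\theta^p)$. Writing
equation (4) in more detail and invoking the
class-conditional independence assumption, equation (1),
we have

$$R_{\underline{\theta}} = \int \sum_{\theta^p \in \Omega^p} G(\theta^p)\, L(\theta_p, d(X^p)) \prod_{i=1}^{p} p(X_i \mid \theta_i)\, dX^p \qquad (5)$$

where the product is over the components of $X^p$. For
any array $\underline{\theta}$, a decision rule $d(X^p)$
minimizing $R_{\underline{\theta}}$ can be obtained by
minimizing the integrand in equation (5) for each $X^p$;
thus for a specific $X^p_{ij}$ (an instance of $X^p$), an
optimal action is:

$d(X^p_{ij})$ = the action (classification) $a$ which minimizes

$$\sum_{\theta^p \in \Omega^p} G(\theta^p)\, L(\theta_p, a) \prod_{i=1}^{p} p(X_i \mid \theta_i). \qquad (6)$$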
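This can be written in a slightly different form which
makes more apparent the specific contribution due to
context (the term in brackets below):

$d(X^p_{ij})$ = the action $a$ which minimizes

$$\sum_{\theta_p \in \Omega} L(\theta_p, a)\, p(X_p \mid \theta_p) \left[ \sum_{\theta_1, \ldots, \theta_{p-1}} G(\theta^p) \prod_{i=1}^{p-1} p(X_i \mid \theta_i) \right]. \qquad (7)$$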

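In practice, a "0-1 loss function" is usually assumed, i.e.,

$$L(\theta, a) = \begin{cases} 0, & \text{if } \theta = a \\ 1, & \text{if } \theta \neq a \end{cases}$$

Then (7) simplifies and the decision rule becomes:

$d(X^p_{ij})$ = the action $a$ which maximizes

$$g_a(X^p_{ij}) = \sum_{\theta^p \in \Omega^p,\ \theta_p = a} G(\theta^p) \prod_{i=1}^{p} p(X_i \mid \theta_i). \qquad (8)$$

Thus (8) defines a set of discriminant functions for the
classification problem.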
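
As a rough illustration of the 0-1 loss rule, the following sketch scores each candidate class a by summing G(theta^p) times the product of the class-conditional densities over all context vectors whose pth component equals a, and picks the largest. The function name `contextual_classify`, the three class names, and all numerical values are hypothetical and chosen only to echo the wheat/lawn example of Section 1; they are not taken from the experiments.

```python
from itertools import product
from math import prod

def contextual_classify(likelihoods, G, classes):
    """Return (best class, scores) under the 0-1 loss contextual rule.

    likelihoods: list of p dicts, likelihoods[i][c] = p(X_i | c); the pixel
                 to be classified is the last (pth) entry.
    G:           dict mapping p-tuples of classes to relative frequencies.
    """
    p = len(likelihoods)
    scores = {a: 0.0 for a in classes}
    for theta in product(classes, repeat=p):
        g = G.get(theta, 0.0)
        if g > 0.0:  # context vectors that never occur contribute nothing
            scores[theta[-1]] += g * prod(
                likelihoods[i][theta[i]] for i in range(p)
            )
    return max(scores, key=scores.get), scores

# Hypothetical p = 2 example: (neighbor, pixel to be classified).
classes = ("house", "lawn", "wheat")
G = {
    ("house", "house"): 0.30, ("house", "lawn"): 0.15,
    ("lawn", "house"): 0.15, ("lawn", "lawn"): 0.10,
    ("wheat", "wheat"): 0.30,
}
neighbor = {"house": 0.90, "lawn": 0.05, "wheat": 0.05}
center = {"house": 0.02, "lawn": 0.45, "wheat": 0.53}

point_label = max(center, key=center.get)  # no-context decision
context_label, scores = contextual_classify([neighbor, center], G, classes)
print(point_label, context_label)
```

With these made-up numbers the no-context decision is "wheat", while the contextual rule prefers "lawn" because wheat next to a house never occurs in the assumed context distribution.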

The optimal choice of $d(\cdot)$ cannot actually be
determined because it depends on $G(\theta^p)$, which is
unknown. We can, however, expect that, at least for large
$N = N_1 \times N_2$, a decision rule in which $G(\theta^p)$
is replaced by an estimate $\hat{G}(\theta^p)$ based on the
observations will have risk $R_{\underline{\theta}}$
approximating that of the optimal rule. (We call this the
"bootstrap effect.") That this is the case when p = 1
(approximating an optimal pointwise classifier with
estimated a priori probabilities) and suitable forms of
estimation are used is a consequence of the work of
VanRyzin (9).

The notion of attempting to approximate the risk
of the best rule of the form equation (3) for p > 1,
given its first general treatment in Gilliland and
Hannan, has not been as thoroughly studied as the
p = 1 version. But related work for p > 1 in sequence
versions (14) of compound decision theory suggests the
validity of the generalization. Further, Vardeman (12)
points out that if one is willing to separate the N
locations into several groups $G_1, G_2, \ldots$, within
each of which the $X^p_{ij}$ are independent, the results
for p = 1 by VanRyzin guarantee that, for p > 1, replacing
the $G(\theta^p)$ by estimates of the frequencies of the
$\theta^p$ group-by-group produces a decision procedure
having the risk of the optimal rule as an approximate
upper bound on its risk. An illustration of this
separation idea is shown in Figure 3.
Figure 3. A 2-context array with separable pixel groups.
In the interest of a practical solution to the
problem of incorporating context into the classification
procedure, estimates of $G(\theta^p)$ were derived
experimentally by simply counting the occurrences of
each $\theta^p$ obtained in a preliminary classification of
the scene without the use of context. Although the
use of this rather crude method of estimating $G(\theta^p)$
has not been studied in the statistical literature,
we will demonstrate in Section 3 its effectiveness
for our application.
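
The counting estimate can be sketched in a few lines. The function name `estimate_context_distribution`, the representation of the context array as a list of (row, column) offsets, and the choice to skip pixels whose context array runs off the image are all assumptions made for illustration; the text does not say how image borders were treated.

```python
from collections import Counter

def estimate_context_distribution(label_map, offsets):
    """Relative frequency of each p-vector of labels in a 2-D label map.

    offsets: list of (di, dj) giving the p positions of the context array,
             with (0, 0), the pixel to be classified, listed last.
    Pixels whose context array falls partly outside the map are skipped
    (an assumption, not a documented detail of the original procedure).
    """
    rows, cols = len(label_map), len(label_map[0])
    counts = Counter()
    for i in range(rows):
        for j in range(cols):
            cells = [(i + di, j + dj) for di, dj in offsets]
            if all(0 <= r < rows and 0 <= c < cols for r, c in cells):
                counts[tuple(label_map[r][c] for r, c in cells)] += 1
    total = sum(counts.values())
    return {theta: n / total for theta, n in counts.items()} if total else {}

# Hypothetical preliminary classification and a p = 2 array (west neighbor, pixel).
preliminary = [["A", "A", "B"],
               ["A", "A", "B"]]
G_hat = estimate_context_distribution(preliminary, [(0, -1), (0, 0)])
print(G_hat)
```

Here four pixels have a west neighbor, giving two occurrences each of ("A", "A") and ("A", "B").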

Before proceeding to a discussion of our experimental
results, we make two further observations concerning
this approach. First, seeking a criterion for the
"context richness" of a scene, we have been able to reach
only the following result. Suppose the frequencies
$G(\theta^p)$ are such that $G(\theta^p)$ can be written in
factored form, i.e.,

$$G(\theta^p) = G_1(\theta')\, G_2(\theta'')$$

where $\theta'$ and $\theta''$ are, respectively, $p - \ell$ and
$\ell$ vectors of classes. Then (6) can be written in the form

$$\left[ \sum_{\theta''} L(\theta_p, a) \prod_{i=p-\ell+1}^{p} p(X_i \mid \theta_i)\, G_2(\theta'') \right] \cdot \left[ \sum_{\theta'} \prod_{i=1}^{p-\ell} p(X_i \mid \theta_i)\, G_1(\theta') \right]$$
But now the terms included in the second summation are
independent of the conditions at the pixel to be
classified and are therefore constant relative to the
decision to be made. Thus, the decision depends only
on $\ell$ components of the p-context array and is
independent of the other $p - \ell$ locations. If it were
possible to determine such factorability of the
$G(\theta^p)$, one could simplify the context classification
computations by reducing the size of the context array.
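
The effect of a factorable context distribution can be checked numerically. The sketch below uses the 0-1 loss scores (sum over context vectors ending in class a) for a hypothetical p = 3 array whose distribution factors into one "far" position and an (l = 2) pair containing the pixel to be classified. All names and numbers are invented for illustration.

```python
from itertools import product
from math import prod

def scores(likelihoods, G, classes):
    """0-1 loss scores: sum of G(theta) * prod p(X_i | theta_i) over theta ending in a."""
    p = len(likelihoods)
    out = {a: 0.0 for a in classes}
    for theta in product(classes, repeat=p):
        out[theta[-1]] += G.get(theta, 0.0) * prod(
            likelihoods[i][theta[i]] for i in range(p)
        )
    return out

classes = ("A", "B")
# Hypothetical factors: G1 over the far position, G2 over (near, center).
G1 = {("A",): 0.7, ("B",): 0.3}
G2 = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
G = {t1 + t2: G1[t1] * G2[t2] for t1 in G1 for t2 in G2}

lik = [{"A": 0.2, "B": 0.8},   # far position
       {"A": 0.9, "B": 0.1},   # near position
       {"A": 0.4, "B": 0.6}]   # pixel to be classified

full = scores(lik, G, classes)         # uses all p = 3 positions
reduced = scores(lik[1:], G2, classes)  # uses only the l = 2 positions
constant = sum(G1[(c,)] * lik[0][c] for c in classes)

for a in classes:
    print(a, full[a], constant * reduced[a])
```

Every full score equals the reduced score times the same constant, so both rules select the same class and the far position can be dropped from the context array.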

Second, comparing (7) with the results of Welch
and Salter and reinterpreting the $G(\theta^p)$ as the
marginal of an a priori distribution for
$\underline{\theta}$, one may view (7) as a generalization
of the Welch and Salter context classification rule. The
advantages of the present formulation are that one need
make no possibly unrealistic assumptions about the
distribution for $\underline{\theta}$ and has complete
freedom to choose both p and the form of the p-context
array. There are situations (e.g., locating clouds and
their associated shadows in a scene) in which context
arrays other than those involving neighboring pixels would
be useful, a possibility unique to this approach.
3. EXPERIMENTAL RESULTS

Experiments were performed to explore the effective- 
ness of contextual classification as applied to the 
analysis of multispectral remote sensing data. First, 
simulated data were used to determine the degree to 
which contextual classification might improve the ana- 
lysis results (as compared to no-context classification) , 
given that the class-conditional densities and the con- 
text distribution for the scene were known. The simu- 
lated data were used again to investigate candidate 
methods for estimating the context distribution since,
as noted in Section 2, it usually cannot be assumed
that the context distribution is known a priori.

Finally, contextual classification was applied to real 
data to determine the extent to which the conclusions 
drawn from the simulated-data experiments could be 
extended to the more realistic case. 

Simulated Data Experiments 

A no-context classification of multispectral remote 
sensing data was selected which had been judged to be 
very accurate (produced by careful analysis and refine- 
ment of multitemporal data) . Such a classification could 
be expected to embody the contextual content of an actual 
ground scene. Using the classification map and the
associated statistics of the classes (developed in
performing the no-context classification), data vectors
were produced by a Gaussian random number generator and
composed into a new data set. Thus the new data set
had the following characteristics:

(1) Each pixel in the simulated data set represented 
the same class as in the "template" classification. 

The template could be considered the "ground truth" 
for the simulated data set. 

(2) All classes in the data set were known and represented. 

(3) All classes had multivariate Gaussian distributions
with statistics typical of those found in real data.

(4) All pixels were class-conditionally independent of 
adjacent pixels. 

(5) There were no mixture pixels. 

Data simulated in this manner are somewhat of an ideali- 
zation of real remote sensing data, but the spatial or- 
ganization of the simulated data is consistent with a 
real world scene and the overall characteristics of the 
data are consistent with the contextual classifier model. 

In essence, then, the experimental results based on the 
simulated data demonstrate the effectiveness of the con- 
text classifier, given that the underlying assumptions 
are satisfied. Further experiments with real data are 
required to generalize the conclusions. 
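
The simulation procedure can be sketched as follows. This is only an illustration under a simplifying assumption: each band is drawn independently (a diagonal covariance), whereas the classes in the experiments had full multivariate Gaussian statistics. The function name `simulate_from_template` and the class statistics shown are hypothetical.

```python
import random

def simulate_from_template(template, class_stats, seed=0):
    """Draw one measurement vector per pixel from the Gaussian of its template class.

    template:    2-D list of class labels (the "ground truth" for the simulation).
    class_stats: class_stats[c] = (means, stds), per-band mean and standard
                 deviation; bands are drawn independently, which is a
                 simplification of a full covariance matrix.
    """
    rng = random.Random(seed)
    return [
        [
            [rng.gauss(mu, sd) for mu, sd in zip(*class_stats[label])]
            for label in row
        ]
        for row in template
    ]

# Hypothetical two-band statistics for two classes.
stats = {"A": ([10.0, 20.0], [1.0, 1.5]),
         "B": ([50.0, 60.0], [2.0, 2.5])}
template = [["A", "A", "B"],
            ["A", "B", "B"]]
data = simulate_from_template(template, stats)
print(len(data), len(data[0]), len(data[0][0]))  # rows, columns, bands
```

Because every pixel is drawn independently given its class, data produced this way satisfy properties (1) through (5) above by construction.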
Three data sets were selected representing a variety
of ground cover types and textures. Data set 1 was agri- 
cultural (Williston, North Dakota) , with ground resolu- 
tion and spectral bands approximating those of the pro- 
jected Landsat-D Thematic Mapper. Data set 2a was 
Landsat-1 data from an urban area (Grand Rapids, Michi- 
gan) . Data set 2b was from the same Landsat frame as 
2a, but from a locale having significantly different 
spatial organization. Each data set was square, 50 
pixels on a side. 

Figure 4 shows the classification results obtained. 
The "no-context" classification accuracy is plotted co- 
incident with the vertical axis of each graph. Data 
set 1 was classified using successively 0, 2, 4, 6 and 
8 neighboring pixels; data sets 2a and 2b were classified 
using 0, 2, 4 and 8 neighboring pixels. The accuracy im- 
provement resulting from the use of contextual information 
was found to be quite significant. 

To accomplish the context classification using this
approach, it is necessary to have available the
class-conditional density functions for the classes to be
recognized, $p(X \mid \omega_i)$, and the context distribution
(the frequency distribution associated with the p-vectors,
$G(\theta^p)$). In remote sensing applications, the
class-conditional density functions are typically learned
from training samples. For the experiments described 
above, the Gaussian class statistics on which the data 
simulations were based were used for the classification 
(these were originally the training statistics used 
to produce the "template" classification). An important
question is how in practice to determine the context
distribution. In the foregoing experiment, this
distribution was simply tabulated from the "template" 
classification (actually, from an area somewhat larger 
than classified in this test) . But in a real data situ- 
ation, such a template is not available, else there would 
be no need to perform any further classification. 

One can envision a number of ways in which the con- 
text distribution might be estimated for a given remote 
sensing application. For example, it could be extracted 
from a classification of data obtained previously from 
the same area. This would require that the area not 
have changed much in its class make-up since the earlier 
data were collected and that the earlier classification 
was reasonably accurate. Alternatively, the distribution 
might be obtained from a classification of any similarly 
constituted area. Still another possibility would be to 
estimate the context distribution for the data to be 
classified from a "conventional" classification of the 
same data determined to have "reasonably good" accuracy. 
Conceivably, one might then refine the contextual
classification by making another estimate of the context
distribution based on the resulting more accurate
classification, and even iterate in this way until no
further improvements in accuracy were obtained. All of
these methods produce an estimate of the context
distribution, and a crucial question on which hinges the
utility of this contextual classification method is how
sensitive the contextual algorithm is likely to be to the
"goodness" of the estimate.

The iterative technique starting with a no-context
classification seemed to be the most practical approach,
since no classifications are needed from earlier data
or from other areas of similar context. All that is
needed is a good initial point-by-point classification
of the area in question.
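
A minimal sketch of this iterative idea is given below, under several assumptions that are not taken from the text: a single fixed context array is used throughout (rather than varying p at each iteration), pixels whose context array runs off the image fall back to the point classifier, and iteration stops when the labels no longer change. All function names are hypothetical, and `lik[i][j][c]` stands for the class-conditional density of pixel (i,j) under class c.

```python
from collections import Counter
from math import prod

def context_cells(i, j, offsets, rows, cols):
    """Cells of the context array at (i, j), or None if any falls off the image."""
    cells = [(i + di, j + dj) for di, dj in offsets]
    inside = all(0 <= r < rows and 0 <= c < cols for r, c in cells)
    return cells if inside else None

def estimate_G(labels, offsets):
    """Context distribution estimated by counting p-vectors in a label map."""
    rows, cols = len(labels), len(labels[0])
    counts = Counter()
    for i in range(rows):
        for j in range(cols):
            cells = context_cells(i, j, offsets, rows, cols)
            if cells:
                counts[tuple(labels[r][c] for r, c in cells)] += 1
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}

def classify(lik, G, offsets, classes):
    """One contextual pass (0-1 loss) over the whole image."""
    rows, cols = len(lik), len(lik[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            cells = context_cells(i, j, offsets, rows, cols)
            scores = {a: 0.0 for a in classes}
            if cells:
                for theta, g in G.items():
                    scores[theta[-1]] += g * prod(
                        lik[r][c][t] for (r, c), t in zip(cells, theta)
                    )
            if not cells or max(scores.values()) == 0.0:
                scores = lik[i][j]  # assumed fallback: point classifier
            row.append(max(classes, key=lambda a: scores[a]))
        out.append(row)
    return out

def bootstrap(lik, offsets, classes, max_iter=10):
    """Start from a no-context classification and re-estimate G until stable."""
    labels = [[max(classes, key=lambda a: px[a]) for px in row] for row in lik]
    for _ in range(max_iter):
        new = classify(lik, estimate_G(labels, offsets), offsets, classes)
        if new == labels:
            break
        labels = new
    return labels

# Hypothetical 1 x 5 strip: the middle pixel weakly favors "B" on its own.
sure_a = {"A": 0.9, "B": 0.1}
strip = [[sure_a, sure_a, {"A": 0.45, "B": 0.55}, sure_a, sure_a]]
result = bootstrap(strip, [(0, -1), (0, 0)], ("A", "B"))
print(result)
```

In this toy strip the no-context pass labels the middle pixel "B"; after one re-estimation of the context distribution the contextual pass relabels it "A", and the next pass leaves the labels unchanged.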

To test the potential of this "bootstrap" technique, 
it was first tried on the simulated data set 2a. Also,
the classifications using the reference template were
rerun using an estimate of the context distribution from
just the 50-pixel-square area classified, rather than
from the larger area (276 x 320) used to obtain the 
estimate for the results presented in Figure 4. This 
was done to provide a better comparison to what could be 
accomplished using the bootstrap technique. 

Using this approach, seven iterations (classifications
followed by re-estimation of the context distribution)
produced an improvement of 36 percent in overall accuracy
compared to the point classification using equal a priori
probabilities (from 52 percent to over 86 percent). No
significant change was observed in average-by-class
accuracy (constant at 68 percent).* This compares
with an increase of over 44 percent in overall accuracy 
(28 percent in average-by-class accuracy) obtained using 
the context distribution estimated from the template 
classification. These results are summarized in 
Figure 5. 

As seen in Figure 5, a number of values of p were
used in the iteration process. At each iteration, the
best classification found by varying p, as judged by
trading off overall accuracy against average-by-class
accuracy, was used as the template for re-estimating the
context distribution for the next iteration.


* Classification performance can be tabulated in two 
ways. Overall accuracy is simply the overall number of 
correct classifications divided by the total number 
attempted. Average-by-class accuracy is obtained by 
first computing the accuracy for each class and taking 
the arithmetic average of the class accuracies. The 
latter is significant when the classification results 
exhibit a tendency to discriminate in favor of or against 
a subset of the classes. 
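
The two accuracy measures defined in this note can be computed as in the following sketch. The function name `accuracies` and the example labels are hypothetical.

```python
from collections import defaultdict

def accuracies(truth, predicted):
    """Return (overall accuracy, average-by-class accuracy) for two label sequences."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(truth, predicted):
        total[t] += 1
        correct[t] += (t == p)
    overall = sum(correct.values()) / sum(total.values())
    by_class = sum(correct[c] / total[c] for c in total) / len(total)
    return overall, by_class

# Hypothetical case: a classifier that ignores the rare class "B" entirely.
truth = ["A"] * 8 + ["B"] * 2
predicted = ["A"] * 10
print(accuracies(truth, predicted))
```

In this example the overall accuracy is 0.8 while the average-by-class accuracy is only 0.5, which is the kind of discrimination against a subset of classes that the second measure is meant to expose.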
Figure 5. Results of contextual classification
using iteratively estimated context distribution
(simulated data set 2a).
The best classification on the first iteration was
obtained for p = 3 (two nearest neighbors), which was
also the case for the second iteration. On the third
iteration, the p = 5 (four nearest neighbors) choice
was deemed best. Finally, by the seventh iteration,
the p = 9 (eight nearest neighbors) choice was
considered best. In this case, the overall accuracy was
slightly less than for the p = 5 choice (88.2 percent
versus 88.6 percent), but the average-by-class accuracy
was better by a larger margin (68.1 percent versus
67.4 percent).

This implementation of the bootstrap technique
involves a larger number of classifications, usually three
or more per iteration. A simpler approach would be to
do just one classification per iteration and increase
the number of nearest neighbors used for each iteration.
As shown in Figure 6, for data set 2a the final result
using this method was virtually the same as for the more
involved procedure.

It was wondered just how much of the accuracy
improvement was due to a better estimate of the
point-by-point prior probabilities. After five iterations
doing 0-nearest-neighbor classification, the improvement in
overall accuracy saturated at 80.3 percent, but the
average performance by class had degraded to 46.9 percent.
This compares closely to the 0-nearest-neighbor
classification done using the context distribution
determined from the reference template, which had an
overall accuracy of 80.8 percent and an average performance
by class of 48.3 percent. It appears from this result that
the context serves to improve the overall performance
compared to that of the 0-nearest-neighbor result while
resisting degradation in average-by-class accuracy.

Figure 6. Contextual classification results based on
simplified iterative technique (simulated data set 2a).

Real Data Experiments 

Having observed excellent performance of the contextual
classifier on simulated data, the next step was to see
how well it would perform on real data. A 50-pixel-square
segment of Landsat data was chosen which included
approximately equal amounts of urban and agricultural area
located to the southeast of Bloomington, Indiana.
Statistics for the spectral classes were estimated using
the 100-pixel-square area centered on the 50-pixel-square
segment. A very careful classification using 14 spectral
classes was performed to delineate agricultural, urban
and forested areas. As there were too few forested
pixels to delineate forest test areas reliably, the
classification was tested only for accuracy in classifying
the agricultural and urban classes. Out of the 2500
pixels in the segment, a total of 867 pixels were manually
interpreted as agricultural and 450 pixels as urban.
The identification was made by interpretation of color
infrared photography taken by aircraft on the same day
as the Landsat pass.

The results from using the full bootstrap technique 
on this data set were not nearly as favorable as the 
results obtained from the simulated data. See Figure 7. 

The no-context classification using uniform prior
probabilities had an overall accuracy of 83.1 percent
and an average-by-class accuracy of 82.7 percent. The
best classification obtained using this result as a
template to estimate the context distribution was a
p = 2 (one-nearest-neighbor) classification based on
the neighbor to the "north" (85.2 percent overall,
84.7 percent average-by-class). Interestingly, the
one-nearest-neighbor result based on the neighbor to
the "west" produced a slightly poorer classification
(84.2 percent overall, 83.8 percent average-by-class).
No apparent features in the scene would account for the
difference (i.e., as seen by eye), raising a new issue
yet to be pursued.

The second iteration was performed using the
one-nearest-neighbor (north) classification from the first
iteration for estimating the context distribution.
Here the two-nearest-neighbor (neighbors to the "north"
and "west") classification was the best, with an overall
accuracy of 85.2 percent and average-by-class accuracy
of 84.7 percent. The best classification for the third
iteration was again the one-nearest-neighbor (north) case
with 85.3 percent overall accuracy and 84.8 percent
average-by-class accuracy. The fourth iteration
produced no improvement. The context classifier thus
yielded only just over two percent improvement in both
overall accuracy and average-by-class accuracy.

In order to assess the sensitivity of these results
to the accuracy of the template used to estimate the
context distribution, a manual "cleanup" of the original
template was performed, as follows: Change the
classification of all incorrectly classified points in the
test areas in the original point-by-point uniform priors
classification to the closest spectral class in the
correct information class as observed by means of a
cross-plot of Landsat bands 2 and 3. Where either of
two spectral classes might have been the correct class,
a coin was tossed to decide the assignment. The context
distribution was then estimated from the entire modified
classification including both test and non-test areas.

The first iteration using this modified classification
as template produced excellent results (Figure 8). The
p = 9 (eight-nearest-neighbor) classification produced
an improvement of over 10 percent to 93.8 percent in
overall accuracy and over 11 percent to 93.6 percent in
average-by-class accuracy (compared to the conventional
point classifier with uniform prior probabilities). A
second iteration was performed using a context
distribution estimated from a similarly modified
eight-nearest-neighbors classification from the first
iteration. No further improvement in accuracy was
observed, suggesting that this iterative process
"saturates" very quickly.

Both the "full bootstrap" technique and the manual
"cleanup" method were also applied to an agricultural
Landsat data set from Kansas. The results were
consistent with the results just described for the
Bloomington data [16]. The full bootstrap method netted
only a two percent improvement in overall accuracy for an
eight-nearest-neighbors classification. The manual cleanup
of the template classification led to a nine percent
improvement (again for eight-nearest-neighbors).

The excellent results produced by using the context 
distribution estimated from the manually modified point 
classification suggest the following approach for clas- 
sification using context: 
Figure 8. Performance using manual template
correction for estimating the context
distribution (Bloomington data).
4. SUMMARY AND CONCLUSIONS

The proposed model for a classifier which utilizes
contextual information is a generalization of the
familiar maximum likelihood classifier. Experimental
results based on simulated multivariable data have
demonstrated that use of contextual information will
significantly improve classification accuracy when the data
satisfy the assumptions underlying the classifier model.
Results for real data have shown that the obtainable
accuracy improvement is dependent, as might be expected,
on the accuracy with which the context distribution is
known. Although satisfactory results have been achieved,
it is clear that further work on ways to improve the
context estimation will pay dividends.

The computational demands presented by the contextual
classifier are not inconsequential. Fundamentally,
the time and space complexity of the method are
proportional to $m^p$, where m is the number of classes and
the context array (including the pixel to be classified)
has p cells. Clever implementation schemes are helpful
in reducing both the computation time and memory
required, but a more practical way to address the problem
may be through the use of multiprocessor systems [15].
Measures of "context richness" of a scene would also
allow for selective use of the contextual classifier
only when significant benefits are likely to be obtained.
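
To give a feel for this growth, the sketch below counts the possible context vectors (the number of terms a naive evaluation of the decision rule would visit per pixel) for m classes and p cells. The value m = 14 is borrowed from the number of spectral classes in the Bloomington experiment purely as an illustration; the function name is hypothetical.

```python
def context_vector_count(m, p):
    """Number of distinct p-vectors of classes when there are m classes."""
    return m ** p

for p in (1, 3, 5, 9):
    print(p, context_vector_count(14, p))
```

For m = 14, going from p = 3 to p = 9 raises the count from 2,744 to 20,661,046,784, which is why in practice only the context vectors with nonzero estimated frequency would be worth enumerating.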
